Benchmarking Distributed Stream Processing Engines

Authors

  • Jeyhun Karimov
  • Tilmann Rabl
  • Asterios Katsifodimos
  • Roman Samarev
  • Henri Heiskanen
  • Volker Markl
Abstract

Over the last years, stream data processing has been gaining attention both in industry and in academia due to its wide range of applications. To fulfill the need for scalable and efficient stream analytics, numerous open source stream data processing systems (SDPSs) have been developed, with high throughput and low latency being their key performance targets. In this paper, we propose a framework to evaluate the performance of three SDPSs, namely Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses in particular on measuring the throughput and latency of windowed operations. For this benchmark, we design workloads based on real-life, industrial use-cases. The main contribution of this work is threefold. First, we give a definition of latency and throughput for stateful operators. Second, we completely separate the system under test and driver, so that the measurement results are closer to actual system performance under real conditions. Third, we build the first driver to test the actual sustainable performance of a system under test. Our detailed evaluation highlights that there is no single winner, but rather, each system excels in individual use-cases.

Similar resources

A Benchmarking Framework for Stream Processors

Stream Processing/Reasoning, an active research topic [5], has been picked up by different communities which developed a diversity of stream processors/reasoners. This however makes empirical evaluation and comparison of these engines a non-trivial task [4]. Different classes of those engines work on different formats of input data, use different languages to formulate queries, evaluate these q...

Benchmarking Distributed Stream Data Processing Systems

The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to compare the systems for simple workloads, there is a clear gap of detailed analyses of the systems’ performance characteristics. In this paper, we propose a ...

Towards Comparing RDF Stream Processing Semantics

The increasing popularity of RDF Stream Processing (RSP) has led to developments of data models and processing engines which diverge in several aspects, ranging from the representation of RDF streams to semantics. Benchmarking systems such as LSBench, SRBench, and CSRBench were introduced as attempts to compare different approaches. However, these works mainly concentrate on the operational asp...

Stream Reasoning

ion techniques like data filtering or summarizing are available and can be readily used. At the global level in a distributed setting, the main challenge is that global information such as the whole routing strategy or communication branches are invisible to nodes in the system. Each node merely knows parent and child nodes, and none has a clear picture of the system status to decide about the ...

PRSP: A Plugin-based Framework for RDF Stream Processing

In this paper, we propose a plugin-based framework for RDF stream processing (PRSP). With this framework, we can apply SPARQL engines to process C-SPARQL queries with maintaining the high performance of those engines in a simple way. Taking advantage of PRSP, we can process large RDF streams in a distributed context via distributed SPARQL engines. Moreover, we can evaluate the performance and c...

Heaven Test Stand: Towards Comparative Research on RSP Engines

The benchmarking of window-based RDF Stream Processing (RSP) engines has recently attracted the attention of the Stream Reasoning community. Solutions like LSBench, SRBench and CSRBench tried to fulfill the need of shared practices for RSP engine evaluations. However, an infrastructure for the systematic comparison of existing systems is still missing. In this paper, we propose the requirements...

Journal title:
  • CoRR

Volume abs/1802.08496

Publication date 2018